Hierarchy Identification for Automatically Generating Table-of-Contents
Authors
Abstract
A table of contents (TOC) provides a quick reference to a document’s content and structure. We present the first study on identifying the hierarchical structure needed to generate a TOC automatically using only textual features, rather than structural hints such as HTML tags. We create two new datasets to evaluate our approaches to hierarchy identification. We find that our algorithm performs at a level sufficient for a fully automated system. For documents without given segment titles, we extend our work by generating segment titles automatically. We make the datasets and our experimental framework publicly available to foster future research in TOC generation.
Similar resources
Visualizing websites using a hierarchical table of contents browser: WebTOC
A method is described for visualizing the contents of a Web site with a hierarchical table of contents using a Java program and applet called WebTOC. The automatically generated expand/contract table of contents provides graphical information indicating the number of elements in branches of the hierarchy as well as individual and cumulative sizes. Color can be used to represent another attribut...
RFID-based decision support within maintenance management of urban tunnel systems
Efficiently tracking information related to components, materials and equipment from the production/construction phase through operation and maintenance is a challenge in industry. The industry environment is a natural fit for generating and utilizing instance-level data for decision support. Advanced electronic identification and data storage technologies e.g. radio frequency identification ...
A Semi-supervised Approach for Generating a Table-of-Contents
This paper presents a semi-supervised model for generating a table-of-contents as an indicative summarization. We mainly focus on using word cluster-based information derived from a large amount of unannotated data by an unsupervised algorithm. We integrate word cluster-based features into a discriminative structured learning model, and show that our approach not only increases the quality of t...
Identification and Prioritization of Factors Affecting E-teacher’s Performance based on Fuzzy Analytic Hierarchy Process (AHP)
Introduction: Information and communication technology has changed the traditional role of teachers and learners. It’s important to detect factors influencing e-teachers’ role to improve their performance more than ever. So in this study we identified and prioritized factors influencing e-teacher’s effective performance. Methods: In this descriptive study, 15 e-learning experts from Tehran Uni...